Historical document analysis based on word matching
Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2011.
Thesis (Master's) -- Bilkent University, 2011.
Includes bibliographical references (leaves 67-76).
Historical documents constitute a heritage that should be preserved, and providing an
automatic retrieval and indexing scheme for these archives would be beneficial
for researchers from several disciplines and countries. Unfortunately, applying ordinary
Optical Character Recognition (OCR) techniques on these documents is
nearly impossible, since these documents are degraded and deformed. Recently,
word matching methods have been proposed to access these documents. In this thesis,
two historical document analysis problems, word segmentation in historical
documents and Islamic pattern matching in kufic images, are tackled based on
word matching. In the first task, an approach based on cross document word matching
is proposed to segment historical documents into words. A version of a
document, in which word segmentation is easy, is used as a source data set and
another version in a different writing style, which is more difficult to segment
into words, is used as a target data set. The source data set is segmented into
words by a simple method and extracted words are used as queries to be spotted
in the target data set. Experiments on an Ottoman data set show that cross
document word matching is a promising method to segment historical documents
into words. In the second task, lines are first extracted and sub-patterns are
automatically detected in the images. Then sub-patterns are matched based on a
line representation in two ways: by their chain code representation and by their
shape contexts. Promising results are obtained for finding the instances of a query
pattern and for fully automatic detection of repeating patterns on a square kufic
image collection.
Arifoğlu, Damla
M.S.
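
The abstract mentions matching sub-patterns by their chain code representation. As a rough illustration only, the sketch below computes a Freeman 8-direction chain code for a sequence of 8-connected points and compares two codes with a plain edit distance. The function names, the direction convention, and the choice of edit distance as the comparison measure are assumptions made for this example; the record does not state how the thesis encodes or compares chain codes.

```python
# Freeman 8-direction chain code: maps a unit step (dx, dy) to a code 0..7.
# Assumed convention: 0 = east, counting counter-clockwise, y axis pointing up.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Return the list of direction codes between consecutive points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not 8-connected neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def edit_distance(a, b):
    """Levenshtein distance between two code sequences (an assumed similarity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical usage: a unit square traced counter-clockwise from the origin.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
square_code = chain_code(square)
```

A smaller edit distance between two codes would indicate more similar line shapes under this assumed measure. Note that a raw chain code depends on the starting point and on rotation, so a practical matcher would need some normalization step that is not shown here.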